Bar plots are one of the most common chart types and come in
several varieties. In the previous lesson, we learned how to make bar
plots and their circular counterparts with {ggplot2}.
In this lesson, we’ll look at how to label plots in {ggplot2},
focusing on the geom_text() and geom_label() functions.
After this lesson, you will be able to:
- Use geom_text() for simple labels
- Use geom_label() for emphasized labels
Run the code below to load the packages for the lesson.
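The loading chunk itself did not survive in this copy of the lesson. A minimal sketch follows; the package list is an assumption inferred from the functions used later: {tidyverse} for ggplot2, dplyr and tidyr, {patchwork} for combining plots with `/`, and {scales} for percent labels.

```r
# Assumed package list, inferred from the functions used in this lesson
library(tidyverse)  # ggplot2, dplyr (group_by, summarise), tidyr (pivot_longer)
library(patchwork)  # combine plots with the `/` operator
library(scales)     # percent() for formatting labels
```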
We’ll start with geom_text() for simple labeling and
then move to geom_label() for labels with more emphasis. We
will show how to use these geoms on simple bar plots, then we will get
into more details on how to leverage them for stacked bars, dodged bars,
normalized stacked bars, and circular plots.
First let’s practice using these functions on a simple bar plot made with fake data. Once we cover the fundamentals of the labeling syntax, we will apply these to real epidemiology data.
# Create example data frame
data <- data.frame(
category = c("A", "B", "C"),
count = c(10, 20, 15)
)
# Create the bar plot
ggplot(data, aes(x = category, y = count)) +
  geom_col(fill = "steelblue")
We can easily add labels to our bars by adding geom_text() and
telling aes() which column to take the label text from:
ggplot(data, aes(x = category, y = count)) +
  geom_col(fill = "steelblue") +
  geom_text(aes(label = count)) # map a variable to the `label` aesthetic
As you can see, however, the placement of the text is odd: each label straddles the top edge of its bar, so it is neither clearly on the bar nor above it. The labels are also quite small and hard to read. We can address both problems by making the text bigger and adjusting its vertical position.
To do this, we will nudge the text upwards using the nudge_y argument. We will also increase the size of the text using the size argument.
ggplot(data, aes(x = category, y = count)) +
  geom_col(fill = "steelblue") +
  geom_text(aes(label = count),
            nudge_y = 1, # move text up
            size = 5)
Note that the value of nudge_y is in the same units as
the y-axis.
Let’s try nudging the text down by setting nudge_y to a
negative value:
ggplot(data, aes(x = category, y = count)) +
  geom_col(fill = "steelblue") +
  geom_text(aes(label = count),
            nudge_y = -3, # move text down
            size = 5)
If we made a horizontal bar plot, we would need to nudge the text to
the right or left using the nudge_x argument instead of
nudge_y:
ggplot(data, aes(x = count, y = category)) +
  geom_col(fill = "steelblue") +
  geom_text(aes(label = count),
            nudge_x = 1, # move text to the right
            size = 5)
Now let’s see how the geom_label() function works. We
can reuse the vertical bar plot code from before, but replace geom_text()
with geom_label():
ggplot(data, aes(x = category, y = count)) +
  geom_col(fill = "steelblue") +
  geom_label(aes(label = count),
             nudge_y = -3,
             size = 5)
As you can see, geom_label() draws a rectangle behind
the text, making it easier to read.
The fill argument of geom_label() controls the background
color of the labels, and color controls the text color. For example,
let’s make the background dark blue and the text white:
ggplot(data, aes(x = category, y = count)) +
  geom_col(fill = "steelblue") +
  geom_label(aes(label = count),
             nudge_y = -3,
             fill = "royalblue4",
             color = "white",
             size = 5)
Consider the following sample data frame:
# Create example data frame
district_cases <- data.frame(
district = c("A", "B", "C"),
cases = c(10, 20, 15)
)
district_cases
##   district cases
## 1        A    10
## 2        B    20
## 3        C    15
Create a labeled bar plot of the data frame above, where the x-axis is the district and the y-axis is the number of cases. The labels should be the number of cases, and should be placed above the bars. The labels should have “darkblue” text with a “lightblue” background. The bar color should be “steelblue”.
Setting a custom {ggplot2} theme
So far, we’ve added a theme function to each of our bar plots. Let’s
learn how to create our own custom theme, and how to use the theme_set()
function to set a global theme for all plots.
We’ll define a custom theme that combines theme_dark() with large, bold axis labels:
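The theme definition itself is missing from this copy. A minimal sketch of what it might look like, assuming the theme is stored as an object named theme_dark_custom (the name used in the plotting code in this section) and that “large bold axis labels” refers to the axis titles. The size of 16 is an arbitrary choice for illustration.

```r
library(ggplot2)

# Start from theme_dark() and override the axis titles.
# The size (16) is illustrative, not taken from the lesson.
theme_dark_custom <- theme_dark() +
  theme(axis.title = element_text(size = 16, face = "bold"))

# Same example data frame as earlier in the lesson, repeated so the
# sketch runs on its own
data <- data.frame(category = c("A", "B", "C"),
                   count = c(10, 20, 15))

# Add the theme to a plot; it is a stored object, so no parentheses
p_themed <- ggplot(data, aes(x = category, y = count)) +
  geom_col(fill = "steelblue") +
  theme_dark_custom
```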
Now we can use this theme for a specific plot like this:
ggplot(data, aes(x = category, y = count)) +
  geom_col(fill = "steelblue") +
  geom_label(aes(label = count),
             nudge_y = -3,
             fill = "royalblue4",
             color = "white") +
  theme_dark_custom
Note the lack of parentheses after theme_dark_custom: it is a stored
theme object, not a function.
We can set this theme as the default for all plots:
theme_set(theme_dark_custom)
Now theme_dark_custom will be automatically applied
to every plot you draw.
For example, let’s redraw the plot we made earlier:
ggplot(data, aes(x = category, y = count)) +
  geom_col(fill = "steelblue") +
  geom_label(aes(label = count),
             nudge_y = -3,
             fill = "royalblue4",
             color = "white")
This is a great way to ensure that all of your plots have a consistent look and feel.
To set the default theme back to the original, use
theme_set(theme_gray()).
vjust and hjust arguments
Rather than using nudge_x and nudge_y to
adjust the position of text, we can use the vjust and
hjust arguments. These arguments adjust the vertical and
horizontal justification of the text, respectively. It is notoriously
difficult to understand exactly how they work, but we will introduce
their basic functionality here.
hjust (horizontal justification)
The hjust argument in ggplot2 adjusts the
horizontal position of text labels relative to their anchor points (the
actual data points). hjust values typically range from 0 to 1,
where:
- hjust = 0 aligns the text label’s left edge with the anchor point.
- hjust = 0.5 centers the text label on the anchor point.
- hjust = 1 aligns the text label’s right edge with the anchor point.
Here’s a simple example to illustrate this. First, let’s make a plot
with a single point and text with no hjust argument:
# Example data
df <- data.frame(x = 1, y = 1)
# Base plot with a point
base_p <- ggplot(df, aes(x, y)) + geom_point() + theme_void()
base_p + geom_text(aes(label = "text"))
With no hjust argument, the text is centered on the
point, which means that the default value of hjust is
0.5.
Now let’s try setting hjust to a variety of values:
p_hjust_0 <- base_p + geom_text(aes(label = "hjust=0"), hjust = 0)
p_hjust_0.25 <- base_p + geom_text(aes(label = "hjust=0.25"), hjust = 0.25)
p_hjust_0.5 <- base_p + geom_text(aes(label = "hjust=0.5"), hjust = 0.5)
p_hjust_0.75 <- base_p + geom_text(aes(label = "hjust=0.75"), hjust = 0.75)
p_hjust_1 <- base_p + geom_text(aes(label = "hjust=1"), hjust = 1)
# Combine plots with patchwork
p_hjust_0 / p_hjust_0.25 / p_hjust_0.5 / p_hjust_0.75 / p_hjust_1
As you can see, the text’s left edge sits on the point
when hjust = 0, its right edge sits on the point when
hjust = 1, and the text moves closer to being centered on the point as
hjust approaches 0.5.
Although hjust is meant to be used between 0
and 1, you can actually use any value, below 0 or above 1.
For example, if you set hjust = -0.2, the
text sits to the right of the anchor point, with a gap equal to 20% of
the text width between the anchor point and the text’s left edge. If you set
hjust = 1.2, the text sits to the left of the anchor point, with a gap of
20% of the text width between the text’s right edge and the anchor
point:
p_hjust_neg0.5 <- base_p + geom_text(aes(label = "hjust=-0.5"), hjust = -0.5)
p_hjust_neg0.2 <- base_p + geom_text(aes(label = "hjust=-0.2"), hjust = -0.2)
p_hjust_1.2 <- base_p + geom_text(aes(label = "hjust=1.2"), hjust = 1.2)
p_hjust_1.5 <- base_p + geom_text(aes(label = "hjust=1.5"), hjust = 1.5)
# Combine plots with patchwork
p_hjust_neg0.5 / p_hjust_neg0.2 / p_hjust_0 / p_hjust_0.25 / p_hjust_0.5 / p_hjust_0.75 / p_hjust_1 / p_hjust_1.2 / p_hjust_1.5
Consider the following horizontal bar plot with text labels added:
# sample data
sample_gender <-
data.frame(gender = c("F", "M"),
n = c(10, 20))
ggplot(sample_gender, aes(x = n, y = gender)) +
  geom_col() +
  geom_text(aes(label = n), color = "white")
Use the hjust or vjust arguments to adjust
the position of the text label so that it is inside the bar, with some
padding on the right side.
Using hjust and vjust values outside the
0-1 range can be problematic when your labels are not the same length,
because the offset is proportional to each label’s own size.
With hjust = 1.2, for instance, longer labels end up with a larger gap
between the label and the anchor point than shorter labels.
For example:
# Different text labels with varying lengths
p_xx <- base_p + geom_text(aes(label = "xxx"), hjust = 1.5)
p_xxxx <- base_p + geom_text(aes(label = "xxxxxx"), hjust = 1.5)
p_xxxxxx <- base_p + geom_text(aes(label = "xxxxxxxxx"), hjust = 1.5)
# Combine plots with patchwork
p_xx / p_xxxx / p_xxxxxx
As you can see, the longer labels have a larger gap between the label
and the anchor point than the shorter labels. This is because
hjust = 1.5 shifts each label left by an extra 50% of its own width,
so longer labels get more padding.
If this is a problem for you, you can use the nudge_x
argument to adjust the position of the labels instead. There
are times when nudges are problematic too, though,
which is why hjust and vjust are still
useful.
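As a rough illustration of that alternative, the sketch below right-aligns each label at the end of its bar with hjust = 1 and then uses nudge_x to add a fixed gap, so the padding no longer depends on label length. The data frame and the nudge value are made up for the example.

```r
library(ggplot2)

# Made-up data whose labels differ in length ("400" vs "1250")
label_demo <- data.frame(group = c("a", "b"),
                         n = c(400, 1250))

p_nudge <- ggplot(label_demo, aes(x = n, y = group)) +
  geom_col() +
  geom_text(aes(label = n),
            color = "white",
            hjust = 1,      # right edge of the text at the anchor
            nudge_x = -20)  # same gap, in x-axis units, for every label
```

Because nudge_x is expressed in x-axis units, a value that suits one plot may need changing when the axis range changes.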
vjust (vertical justification)
Similarly, the vjust argument in ggplot2
adjusts the vertical position of text labels in relation to their anchor
points. vjust values also typically range from 0 to 1, where:
- vjust = 0 aligns the bottom edge of the text label with the anchor point.
- vjust = 0.5 centers the text label vertically on the anchor point.
- vjust = 1 aligns the top edge of the text label with the anchor point.
Here’s an example to illustrate vjust. We’ll start with
the same base plot and add text with no vjust argument:
# Base plot with a point
p <- ggplot(df, aes(x, y)) + geom_point() + theme_void()
p + geom_text(aes(label = "text"))
By default, with no vjust specified, the text is
vertically centered on the point, indicating the default value of
vjust is 0.5.
Now, let’s experiment with different vjust values:
p_vjust_0 <- p + geom_text(aes(label = "vjust=0"), vjust = 0)
p_vjust_0.25 <- p + geom_text(aes(label = "vjust=0.25"), vjust = 0.25)
p_vjust_0.5 <- p + geom_text(aes(label = "vjust=0.5"), vjust = 0.5)
p_vjust_0.75 <- p + geom_text(aes(label = "vjust=0.75"), vjust = 0.75)
p_vjust_1 <- p + geom_text(aes(label = "vjust=1"), vjust = 1)
# Combine plots with patchwork
p_vjust_0 / p_vjust_0.25 / p_vjust_0.5 / p_vjust_0.75 / p_vjust_1
Here, vjust = 0 places the bottom edge of the text on the
point, so the text sits just above it; vjust = 1 places the top edge on
the point, so the text sits just below it; and as
vjust approaches 0.5, the text moves closer to being vertically
centered.
Like hjust, vjust can also take values
outside the 0 to 1 range. For example, vjust = -0.2 would
place the text slightly above the anchor point, and
vjust = 1.2 would place it slightly below. Let’s see how
these values affect text positioning:
p_vjust_neg0.5 <- p + geom_text(aes(label = "vjust=-0.5"), vjust = -0.5)
p_vjust_1.5 <- p + geom_text(aes(label = "vjust=1.5"), vjust = 1.5)
# Combine plots with patchwork
p_vjust_neg0.5 / p_vjust_0 / p_vjust_0.25 / p_vjust_0.5 / p_vjust_0.75 / p_vjust_1 / p_vjust_1.5
As with hjust, using vjust values beyond
the typical 0 to 1 range can be useful for fine-tuning the placement of
your text labels, leaving a small gap between the label and the
anchor point.
Consider the following bar plot with text labels added:
# sample data
sample_gender <-
data.frame(gender = c("F", "M"),
n = c(10, 20))
ggplot(sample_gender,
       aes(x = gender, y = n)) +
  geom_col() +
  geom_text(aes(label = n), color = "white")
Use the hjust or vjust arguments to adjust
the position of the text label so that it is inside the bar, with some
padding on the top.
Let’s apply what we’ve learned to a real dataset.
The tb_outcomes dataset, which we used in the previous
lesson, will serve as the foundation for our examples.
We’ll be trying to plot the number of cases per hospital.
Unlike with our initial practice dataset, we do not already have the
total number of cases per hospital; this information is stored in the
cases column, but we need to summarize it first.
Let’s calculate the total number of cases per hospital using the
group_by() and summarize() functions:
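The summarizing code is not shown in this copy. A sketch of the step, using a small made-up stand-in for tb_outcomes so that it runs on its own. The `_demo` names and values are hypothetical; in the lesson the result is presumably stored as hospital_sums, the name used in the plotting code that follows.

```r
library(dplyr)

# Made-up stand-in for tb_outcomes: several rows per hospital
tb_outcomes_demo <- tibble(
  hospital = c("Hospital X", "Hospital X", "Hospital Y", "Hospital Y"),
  cases    = c(12, 8, 5, 10)
)

# Sum the cases within each hospital
hospital_sums_demo <- tb_outcomes_demo %>%
  group_by(hospital) %>%
  summarise(cases = sum(cases))

hospital_sums_demo
```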
## # A tibble: 6 × 2
##   hospital         cases
##   <chr>            <dbl>
## 1 CHPP Akron         875
## 2 CS Abomey-Calavi   791
## 3 Hopital Bethesda   256
## 4 Hopital Savalou     80
## 5 Hopital St Luc     168
## 6 St Jean De Dieu    171
Now let’s use hospital_sums to visualize each hospital’s
total number of cases and use geom_text() to annotate the
bars:
ggplot(hospital_sums, aes(x = hospital, y = cases)) +
  geom_col(fill = "steelblue") +
  geom_text(aes(label = cases),
            vjust = -0.2)
Great, now you see how to use the summarize() function
to calculate group totals, and how to use geom_text() to
annotate your plots.
Consider the aus_tb_notifs dataset imported below, which
shows the number of TB cases in urban and rural areas per quarter:
## # A tibble: 52 × 4
##     year quarter rural urban
##    <dbl> <chr>   <dbl> <dbl>
##  1  2010 Q1          4    87
##  2  2010 Q2          4    98
##  3  2010 Q3          5   101
##  4  2010 Q4         10   124
##  5  2011 Q1          5    81
##  6  2011 Q2          4    52
##  7  2011 Q3          9   102
##  8  2011 Q4          5   100
##  9  2012 Q1          9    80
## 10  2012 Q2          4    63
## # ℹ 42 more rows
Create a simple bar plot to visualize the total number of TB cases in
urban areas for each year. Label each bar with the
total number of cases using geom_text() just below the
bar.
Hint: First, aggregate the data by year and sum up
the urban cases. Then use ggplot() with
geom_col() for the bar plot and geom_text()
for the labels.
Further Aesthetic modifications
So far we have only used some of the possible aesthetics for
geom_text(). The minimum three aesthetics are
x, y, and label. These must be
mapped to a variable defined inside aes().
Additional aesthetics include:
- size: the size of the text, in mm
- angle: the angle of the text, from 0 to 360
- alpha: the transparency of the text, from 0 to 1
- color: the color of the text
- family: the font family of the text, such as “sans”, “serif”, “mono”
- fontface: the font face of the text, including “plain”, “bold”, “italic”, “bold.italic”
- group: a grouping variable for the text
- hjust: horizontal justification of the text
- vjust: vertical justification of the text
- lineheight: the line height of the text
nudge_y and nudge_x are also available, but
are not formally considered aesthetics, as they cannot be mapped to a
variable inside aes(), and must be set outside of it.
Here is an example plot with most of these aesthetics set. It’s not a very beautiful plot. Try modifying the code to see how each aesthetic changes the plot:
ggplot(hospital_sums, aes(x = hospital, y = cases)) +
  geom_col(fill = "steelblue") +
  geom_text(aes(label = paste(cases, "\ncases")),
            size = 5,
            angle = 0,
            alpha = 0.5,
            color = "black",
            family = "mono",
            fontface = "bold",
            hjust = 0.5,
            vjust = 1,
            nudge_y = -10,
            lineheight = 0.8) +
  theme(axis.text.x = element_text(angle = 90))
So far, we’ve only looked at bar plots with a single categorical variable. Let’s build plots with two categorical variables and add labels to each subgroup. We’ll start with stacked bar plots.
We summarize the tb_outcomes dataset by
period_date and diagnosis_type, calculating
the sum of cases (cases) for each group.
# Summarize the data by period and diagnosis type
tb_sum <- tb_outcomes %>%
  group_by(period_date, diagnosis_type) %>%
  summarise(cases = sum(cases))
tb_sum
Now, let’s create a simple stacked bar plot and see how to add labels to it:
# Create a basic bar plot using the summarized data
quarter_dx_bar <- tb_sum %>%
  ggplot(aes(x = period_date, y = cases, fill = diagnosis_type)) +
  geom_col() +
  labs(title = "New and relapse TB cases per quarter",
       subtitle = "Data from six health facilities in Benin, 2015-2017")
quarter_dx_bar
We’ll use the cases column for labeling each bar:
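The call that produces the misplaced labels is missing from this copy; presumably it is a plain geom_text() layer with no position argument. A minimal sketch on made-up data, where tb_sum_demo is a hypothetical stand-in for tb_sum:

```r
library(ggplot2)

# Made-up stand-in for tb_sum
tb_sum_demo <- data.frame(
  period_date    = c("2015Q1", "2015Q1", "2015Q2", "2015Q2"),
  diagnosis_type = c("bacteriological", "clinical",
                     "bacteriological", "clinical"),
  cases          = c(100, 40, 120, 30)
)

p_unstacked <- ggplot(tb_sum_demo,
                      aes(x = period_date, y = cases, fill = diagnosis_type)) +
  geom_col() +                   # bars are stacked by default
  geom_text(aes(label = cases))  # labels are not stacked by default

# Inspect the label layer: each label sits at its raw `cases` value
layer_data(p_unstacked, 2)[, c("x", "y", "label")]
```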
Oops, the labels are not in the right place! They don’t align with the height of the bars in our plot.
The issue is that geom_text() does not stack positions
by default like geom_col(). We must explicitly set
position = "stack" in geom_text():
# Place text at the top of each bar segment
quarter_dx_bar +
  geom_text(aes(label = cases),
            position = "stack") # Set position to stack
Great!
To vertically align the text inside the bars, we can add
vjust to geom_text():
# Move the labels inside the top of each bar segment
quarter_dx_bar +
  geom_text(aes(label = cases),
            position = "stack",
            vjust = 1.5)
This works well: the labels are now inside the bars, and setting
vjust = 1.5 adds an extra 50% of label height as padding
between the label and the top of the bar segment.
But what if we want to center the labels vertically within each bar
segment? To do this, we switch from position = "stack" to
the more customizable position_stack() function, and set
vjust = 0.5 within
position_stack():
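That code is missing from this copy. A sketch of the idea on made-up data, where tb_sum_demo is a hypothetical stand-in for tb_sum:

```r
library(ggplot2)

# Made-up stand-in for tb_sum (a single stacked bar)
tb_sum_demo <- data.frame(
  period_date    = c("2015Q1", "2015Q1"),
  diagnosis_type = c("bacteriological", "clinical"),
  cases          = c(100, 40)
)

p_centered <- ggplot(tb_sum_demo,
                     aes(x = period_date, y = cases, fill = diagnosis_type)) +
  geom_col() +
  geom_text(aes(label = cases),
            position = position_stack(vjust = 0.5))  # middle of each segment
```

Here vjust is passed to position_stack() rather than to geom_text(), so it is interpreted relative to the bar segment (0 = bottom, 1 = top) rather than relative to the text itself.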
Now the labels are vertically centered within each bar segment.
This label placement is especially nice for horizontal bar plots.
Below we flip the axes of our plot using coord_flip() to
create a horizontal bar plot, and add some extra aesthetic modifications
to make the plot more readable:
quarter_dx_bar +
  geom_text(aes(label = cases),
            position = position_stack(vjust = 0.5),
            # some extra adjustments
            color = "white",
            fontface = "bold") +
  coord_flip()
That looks great! Let’s move on to dodged bar charts now.
Create a stacked bar plot showing the distribution per year of TB
cases in rural and urban areas using the aus_tb_notifs
dataset. Use geom_text() and adjust the position of the
labels for clarity.
Hint: Pivot the data so that area_type
is a column, then summarize the data by year and
area_type, calculating the sum of cases
(cases) for each group. The pivoting is done for you in the
code below.
# Pivot the data
aus_tb_notifs %>%
  pivot_longer(cols = c(rural, urban),
               names_to = "area_type",
               values_to = "cases")
## # A tibble: 104 × 4
##     year quarter area_type cases
##    <dbl> <chr>   <chr>     <dbl>
##  1  2010 Q1      rural         4
##  2  2010 Q1      urban        87
##  3  2010 Q2      rural         4
##  4  2010 Q2      urban        98
##  5  2010 Q3      rural         5
##  6  2010 Q3      urban       101
##  7  2010 Q4      rural        10
##  8  2010 Q4      urban       124
##  9  2011 Q1      rural         5
## 10  2011 Q1      urban        81
## # ℹ 94 more rows
Dodged bar charts display multiple categories side by side. Let’s explore how to group the data and properly position labels for clear interpretation.
To begin, we’ll group our dataset tb_outcomes by
hospital and diagnosis_type, calculating the
sum of cases (cases) for each group.
hospital_dx_cases <- tb_outcomes %>%
  group_by(hospital, diagnosis_type) %>%
  summarise(cases = sum(cases))
hospital_dx_cases
Next, let’s create a simple dodged bar chart, where
the height of each bar signifies the total number of cases for a
specific diagnosis in each hospital. Since the default position for
geom_col() is "stack", we must explicitly set
position = "dodge" to create a dodged bar chart.
hospital_dx_bar <- hospital_dx_cases %>%
  ggplot(aes(x = hospital, y = cases, fill = diagnosis_type)) +
  geom_col(position = "dodge")
hospital_dx_bar
Now, we can annotate the chart with geom_text() to
display the labels, just as we’ve done before.
Oops, that’s not quite right! The labels are vertically centered in a straight line, and they’re not aligned with the bars. Let’s take a look at how we can fix that.
Just as with our stacked bar chart in the previous section, we need
to add the position adjustment to geom_text(). This time
we’re going to specify position = position_dodge().
Oh no. We get the same chart as before. This is because a width
argument is required for position_dodge().
For geom_col(), the default value of width
is 0.9. We’ll also use 0.9 for
geom_text() to ensure the bars and labels are aligned:
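The corrected call is not shown in this copy. A sketch on made-up data, where hospital_dx_demo is a hypothetical stand-in for hospital_dx_cases:

```r
library(ggplot2)

# Made-up stand-in for hospital_dx_cases
hospital_dx_demo <- data.frame(
  hospital       = c("Hospital X", "Hospital X", "Hospital Y", "Hospital Y"),
  diagnosis_type = c("bacteriological", "clinical",
                     "bacteriological", "clinical"),
  cases          = c(70, 10, 149, 19)
)

p_dodged <- ggplot(hospital_dx_demo,
                   aes(x = hospital, y = cases, fill = diagnosis_type)) +
  geom_col(position = "dodge") +
  geom_text(aes(label = cases),
            position = position_dodge(width = 0.9))  # match the bar width
```

If the bars were drawn with a different width, the width passed to position_dodge() for the labels would need to match it.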
Now all that’s left to do is shift the labels up a bit with vjust.
hospital_dx_bar +
  geom_text(aes(label = cases),
            position = position_dodge(width = 0.9),
            vjust = -0.5)
That looks great! Next, we’ll move on to percent-stacked bar plots.
Generate a dodged bar plot that displays rural and urban TB cases
side by side for each year using the aus_tb_notifs dataset.
Label each bar using geom_text(), ensuring the labels are
correctly aligned.
You can use the code and comments below as a guide:
# Pivot the data
aus_tb_notifs %>%
  pivot_longer(cols = c(rural, urban),
               names_to = "area_type",
               values_to = "cases")
## # A tibble: 104 × 4
##     year quarter area_type cases
##    <dbl> <chr>   <chr>     <dbl>
##  1  2010 Q1      rural         4
##  2  2010 Q1      urban        87
##  3  2010 Q2      rural         4
##  4  2010 Q2      urban        98
##  5  2010 Q3      rural         5
##  6  2010 Q3      urban       101
##  7  2010 Q4      rural        10
##  8  2010 Q4      urban       124
##  9  2011 Q1      rural         5
## 10  2011 Q1      urban        81
## # ℹ 94 more rows
When labeling percent-stacked bar plots, the labels should reflect the percentages of each category. This means we need to format the labels into percentages to ensure they match the segments on the chart. By the end of this section, you’ll know how to create a graph like the one below!
To get started, let’s calculate the total number of cases for each
health facility (hospital) by diagnosis type.
hosp_dx_sum <- tb_outcomes %>%
  group_by(hospital, diagnosis_type) %>%
  summarise(total_cases = sum(cases))
hosp_dx_sum
We could use this dataset to create a percent-stacked bar plot. You
may remember from the last lesson that for percent-stacked plots, we
need to use the fill position. By now, you should recognize
that we want to use the more customizable position_fill()
instead of the simpler position = "fill". Let’s apply this
position to both the bars and the labels.
hosp_dx_sum %>%
  ggplot(aes(x = hospital, y = total_cases, fill = diagnosis_type)) +
  geom_col(position = position_fill()) +
  geom_text(aes(label = total_cases),
            position = position_fill())
This is a good start, but it needs some improvements. For starters, we want percentages, not raw values.
To prepare our data for this, we need to calculate the proportion of cases for each hospital and diagnosis type before we create the plot:
hosp_dx_prop <- tb_outcomes %>%
  group_by(hospital, diagnosis_type) %>%
  summarise(total_cases = sum(cases)) %>%
  mutate(prop = total_cases / sum(total_cases))
hosp_dx_prop
## # A tibble: 12 × 4
## # Groups:   hospital [6]
##    hospital         diagnosis_type  total_cases  prop
##    <chr>            <chr>                 <dbl> <dbl>
##  1 CHPP Akron       bacteriological         695 0.794
##  2 CHPP Akron       clinical                180 0.206
##  3 CS Abomey-Calavi bacteriological         671 0.848
##  4 CS Abomey-Calavi clinical                120 0.152
##  5 Hopital Bethesda bacteriological         139 0.543
##  6 Hopital Bethesda clinical                117 0.457
##  7 Hopital Savalou  bacteriological          70 0.875
##  8 Hopital Savalou  clinical                 10 0.125
##  9 Hopital St Luc   bacteriological         149 0.887
## 10 Hopital St Luc   clinical                 19 0.113
## 11 St Jean De Dieu  bacteriological         100 0.585
## 12 St Jean De Dieu  clinical                 71 0.415
Now we have a proportion column, prop, that we can use
to create our percent-stacked bar plot.
Let’s create a bar chart using our new dataset
hosp_dx_prop with prop as our new y
variable:
hosp_dx_fill <- hosp_dx_prop %>%
  ggplot(aes(x = hospital, y = prop, fill = diagnosis_type)) +
  geom_col(position = position_fill())
hosp_dx_fill
Now, we can use geom_text() and specify the position of
the labels:
It’s a good start, but obviously, we still have some work to do to make it look nicer!
Before adjusting our labels, let’s handle those decimals. We could reduce the number of decimals like this:
However, the better method is this:
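Neither snippet survives in this copy. Judging from the explanation that follows, the first option is presumably plain rounding and the second is scales::percent(). A small sketch comparing the two on a few proportions copied from the table above:

```r
library(scales)

# A few proportions from the hosp_dx_prop table
prop <- c(0.794, 0.206, 0.848)

# Option 1: round the raw proportions (still decimals, no percent sign)
rounded <- round(prop, 2)

# Option 2: format as percentages; accuracy = 1 rounds to whole percents
pct_labels <- percent(prop, accuracy = 1)

rounded
pct_labels
```

Inside a plot, the second option would be written as aes(label = scales::percent(prop)) in geom_text().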
The {scales} package is commonly used with
{ggplot2} for customizing aesthetics, transforming axis
scales, formatting labels, defining color palettes, and more.
The scales::percent(prop) function we used in the code
above with geom_text() converts the proportions (values
from our prop variable) into a percentage format and adds
percentage signs. We can also control the number of displayed digits
using the accuracy argument (see below).
Next, we can center the labels using vjust in the
position_fill() function:
hosp_dx_fill +
  geom_text(aes(label = scales::percent(prop)),
            position = position_fill(vjust = 0.5)) # center labels
It looks great, but we can do better! Using flipped coordinates in bar charts can greatly improve readability:
hosp_dx_fill +
  geom_text(aes(label = scales::percent(prop, accuracy = 1)),
            position = position_fill(vjust = 0.5)) +
  coord_flip()
Great, now we can add some additional aesthetic tweaks:
hosp_dx_fill +
  geom_text(aes(label = scales::percent(prop, accuracy = 1)),
            position = position_fill(vjust = 0.5),
            color = "white",   # Change text color
            fontface = "bold", # Make it bold
            size = 4.5) +      # Change font size
  theme_light() +
  coord_flip()
Amazing! Let’s move on to our last section where we’ll take a look at circular plots.
Transform the aus_tb_notifs data into a percent-stacked
bar plot, with a bar for each year, and the fill aesthetic mapped to the
area type (rural vs urban).
Label each segment with the percentage of cases using
geom_text(). Format the labels as percentages.
You can use the code and comments below as a guide:
# Pivot the data
aus_tb_notifs %>%
  pivot_longer(cols = c(rural, urban),
               names_to = "area_type",
               values_to = "cases")
## # A tibble: 104 × 4
##     year quarter area_type cases
##    <dbl> <chr>   <chr>     <dbl>
##  1  2010 Q1      rural         4
##  2  2010 Q1      urban        87
##  3  2010 Q2      rural         4
##  4  2010 Q2      urban        98
##  5  2010 Q3      rural         5
##  6  2010 Q3      urban       101
##  7  2010 Q4      rural        10
##  8  2010 Q4      urban       124
##  9  2011 Q1      rural         5
## 10  2011 Q1      urban        81
## # ℹ 94 more rows
As usual, let’s begin by summarizing the data.
We’ll calculate the total number of cases for each hospital by
grouping the data based on the hospital variable and then
calculating the sum of cases in each group.
total_results <- tb_outcomes %>%
  group_by(hospital) %>%
  summarise(
    total_cases = sum(cases))
total_results
Now that we have our new dataset, let’s start by creating a simple bar chart. You may recall from the previous lesson that a pie chart is essentially a round version of a 100% stacked bar chart.
results_stack <- ggplot(total_results,
                        aes(x = 4, # Set an arbitrary x value
                            y = total_cases,
                            fill = hospital)) +
  geom_col()
results_stack
Now, we can create our basic pie chart. As we learned in the last
lesson, to transform linear coordinates into polar coordinates, we use
the coord_polar() function. The theta
parameter defines which aesthetic variable should be mapped to the
angular coordinate in the polar coordinate system. By specifying
"y", we use the height of the bars to determine the angle
of each slice in our pie chart.
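The pie chart code is missing from this copy. A sketch follows, using made-up totals in place of total_results; the `_demo` names are hypothetical. In the lesson the result is presumably stored as outcome_pie, the name used by the labeling code further down.

```r
library(ggplot2)

# Made-up stand-in for total_results
total_results_demo <- data.frame(
  hospital    = c("Hospital X", "Hospital Y", "Hospital Z"),
  total_cases = c(80, 168, 256)
)

# A single stacked bar at an arbitrary x position
results_stack_demo <- ggplot(total_results_demo,
                             aes(x = 4, y = total_cases, fill = hospital)) +
  geom_col()

# Map the stacked heights (y) to the angle to turn the bar into a pie
outcome_pie_demo <- results_stack_demo +
  coord_polar(theta = "y")
```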
Great! This will serve as our base pie chart. Next, let’s create a
base donut chart using xlim().
outcome_donut <- ggplot(total_results,
                        aes(x = 4,
                            y = total_cases,
                            fill = hospital)) +
  geom_col() +
  xlim(c(0, 4.5)) +
  coord_polar(theta = "y")
outcome_donut
Alright, we’re ready to move on to labelling!
Let’s add labels to our pie chart using geom_text().
You’ll notice that the numbers appear in the wrong segments because
we haven’t added a position adjustment to the labeling
geometry yet.
Now, just as we did previously, we will use the
position_stack() function with vjust to center
the labels.
outcome_pie +
  geom_text(aes(label = total_cases),
            position = position_stack(vjust = 0.5)) # Center the labels
We can do the same with geom_label().
# Similar adjustment with geom_label()
outcome_pie +
  geom_label(aes(label = total_cases),
             position = position_stack(vjust = 0.5))
To remove the letter “a” from the legend, we can use
show.legend = FALSE:
outcome_pie +
  geom_label(aes(label = total_cases),
             position = position_stack(vjust = 0.5),
             show.legend = FALSE)
Next, let’s move on to our basic donut chart. We’ll label it using
geom_text():
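This step is also missing from this copy. Presumably it mirrors the pie chart labels; a sketch on made-up totals, where the `_demo` names are hypothetical stand-ins:

```r
library(ggplot2)

# Made-up stand-in for total_results
total_results_demo <- data.frame(
  hospital    = c("Hospital X", "Hospital Y", "Hospital Z"),
  total_cases = c(80, 168, 256)
)

outcome_donut_demo <- ggplot(total_results_demo,
                             aes(x = 4, y = total_cases, fill = hospital)) +
  geom_col() +
  xlim(c(0, 4.5)) +         # the empty range below the bar becomes the hole
  coord_polar(theta = "y")

# Center each label within its ring segment
outcome_donut_labeled <- outcome_donut_demo +
  geom_text(aes(label = total_cases),
            position = position_stack(vjust = 0.5))
```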
To finish, we can make some additional aesthetic adjustments. Here,
we enhance the chart’s aesthetics by applying theme_void()
to remove cluttered background elements, introducing a new color palette
with scale_fill_viridis_d(), and adjusting the text labels
using geom_text() with white and bold text for better
visibility and contrast.
# Additional aesthetic modifications
outcome_donut +
  geom_text(aes(label = total_cases),
            position = position_stack(vjust = 0.5),
            color = "white",
            fontface = "bold") +
  theme_void() +
  scale_fill_viridis_d()
Congratulations, it looks great!
Plot total TB cases in all rural vs urban areas in the
aus_tb_notifs dataset as a pie chart. Use
geom_text() to place labels correctly, indicating the
number of cases in that area.
You can use the code and comments below as a guide:
# Pivot then summarize the total cases per area type
aus_tb_notifs %>%
  pivot_longer(cols = c(rural, urban),
               names_to = "area_type",
               values_to = "cases") %>%
  group_by(area_type) %>%
  summarise(total_cases = sum(cases))
## # A tibble: 2 × 2
##   area_type total_cases
##   <chr>           <dbl>
## 1 rural             394
## 2 urban            4981
Pro-Tip: Enhancing Text Labels with ggtext
For advanced plotters seeking even more sophisticated control over text formatting in ggplot2, the {ggtext} package may come in handy. It allows the use of CSS to precisely format text elements, including options to embolden, italicize, change color and size, add superscripts/subscripts, and even embed images. Notably, you can apply multiple styles within the same text element, opening up new levels of creativity and customization.
Consider the example below, which uses {ggtext} for the plot title, subtitle and bar labels:
pacman::p_load(tidyverse, ggtext, medicaldata)
# Data and Plot
medicaldata::strep_tb %>%
count(gender) %>%
mutate(gender_label = paste0("**<span style='font-size:16pt'>", n, "</span>**",
if_else(gender == "M", " men", " women"))) %>%
ggplot(aes(x = gender, fill = gender, y = n)) +
geom_col() +
scale_fill_manual(values = c("M" = "#ee6c4d", "F" = "#424874")) +
labs(
title = "<b><span style='color:#424874; font-size:19pt'>Female</span> vs
<span style='color:#ee6c4d; font-size:19pt'>Male</span>
Patients in Strep Study</b>",
subtitle = "<span style='color:gray60'>A demonstration of custom text labels with </span>**{ggtext}**") +
theme_classic() +
theme(plot.title = element_textbox_simple(),
plot.subtitle = element_textbox_simple(),
legend.position = "none",
axis.text.x = element_blank()) +
geom_richtext(aes(label = gender_label, y = n/2),
label.r = grid::unit(5, "pt"), fill = "white")
To learn more about {ggtext}, visit the package website.
This lesson covered the use of geom_text() and
geom_label() in ggplot2, demonstrating their application in
various plot types. We learned how to add and adjust text labels for
clarity and effective data presentation, from basic bar plots to more
complex formats like stacked and circular plots. These skills are
essential for enhancing the readability and informative value of
graphical data representations in R.
ggplot(district_cases, aes(x = district, y = cases)) +
  geom_col(fill = "steelblue") +
  geom_label(aes(label = cases),
             nudge_y = 1,        # Adjust to position the labels above the bars
             fill = "lightblue", # Background color of the labels
             color = "darkblue"  # Text color
  )
# Adjusting text position inside the bar
ggplot(sample_gender, aes(x = n, y = gender)) +
  geom_col() +
  geom_text(aes(label = n), color = "white", hjust = 1.2)
# Adjusting text position inside the bar
ggplot(sample_gender,
       aes(x = gender, y = n)) +
  geom_col() +
  geom_text(aes(label = n), color = "white", vjust = 1.2)
# aggregate the data by year and sum up the urban cases
urban_cases <-
  aus_tb_notifs %>%
  group_by(year) %>%
  summarize(urban_cases = sum(urban))
# plot the data
ggplot(urban_cases, aes(x = year, y = urban_cases)) +
  geom_col(fill = "steelblue") +
  geom_text(aes(label = urban_cases),
            color = "white",
            vjust = 1.2)
# Pivot the data
aus_tb_pivoted <- aus_tb_notifs %>%
  pivot_longer(cols = c(rural, urban),
               names_to = "area_type",
               values_to = "cases")
# Summarize the data by year and area type
aus_tb_summarized <- aus_tb_pivoted %>%
  group_by(year, area_type) %>%
  summarise(cases = sum(cases))
# Create the stacked bar plot
ggplot(aus_tb_summarized, aes(x = year, y = cases, fill = area_type)) +
  geom_col() +
  geom_text(aes(label = cases),
            position = position_stack(vjust = 0.5))
# Pivot the data
aus_tb_pivoted <- aus_tb_notifs %>%
  pivot_longer(cols = c(rural, urban),
               names_to = "area_type",
               values_to = "cases")
# Summarize the data by year and area type
aus_tb_summarized <- aus_tb_pivoted %>%
  group_by(year, area_type) %>%
  summarise(cases = sum(cases))
# Create the dodged bar plot
ggplot(aus_tb_summarized, aes(x = year, y = cases, fill = area_type)) +
  geom_col(position = "dodge") +
  geom_text(aes(label = cases),
            position = position_dodge(width = 0.9),
            vjust = -0.3,
            size = 2.8)
# Pivot the data
aus_tb_pivoted <- aus_tb_notifs %>%
  pivot_longer(cols = c(rural, urban),
               names_to = "area_type",
               values_to = "cases")
# Summarize and calculate proportions
aus_tb_percent <- aus_tb_pivoted %>%
  group_by(year, area_type) %>%
  summarise(cases = sum(cases)) %>%
  mutate(percentage = cases / sum(cases))
# Create the percent-stacked bar plot
ggplot(aus_tb_percent, aes(x = year, y = percentage, fill = area_type)) +
  geom_col(position = "fill") +
  geom_text(aes(label = scales::percent(percentage, accuracy = 1)),
            position = position_fill(vjust = 0.5),
            size = 3,
            color = "white",
            fontface = "bold")
# Summarize the total cases per area type
aus_tb_rural_urban <- aus_tb_notifs %>%
  pivot_longer(cols = c(rural, urban),
               names_to = "area_type",
               values_to = "cases") %>%
  group_by(area_type) %>%
  summarise(total_cases = sum(cases))
# Create the pie chart
ggplot(aus_tb_rural_urban, aes(x = "", y = total_cases, fill = area_type)) +
  geom_col() +
  geom_text(aes(label = total_cases), position = position_stack(vjust = 0.5)) +
  coord_polar(theta = "y") +
  theme_void()
The following team members contributed to this lesson:
Some material in this lesson was adapted from the following sources:
This work is licensed under the Creative Commons Attribution Share Alike license.